Efficient 16-bit floating point interval processor for embedded applications

Author

  • Michel Kieffer
Abstract

In the last ten years, interval techniques [1, 2] have made it possible to propose original solutions to many engineering problems; see, e.g., [3]. One of the main features of interval techniques is their ability to provide guaranteed results, i.e., results with a verified accuracy or that are numerically proved. Consider, for example, a bounded-error parameter estimation problem: the value of some parameter vector has to be estimated from measured data, given a model structure and bounded measurement errors. In such a context, one may obtain a set that can be proved to contain all values of the parameter vector that are consistent with the model structure, the measured data, and the hypotheses on the noise. Nevertheless, the application of interval techniques in embedded real-time applications is far less developed. The lack of efficient hardware support for intervals may be one reason for this slower development. Hardware implementations of interval arithmetic were mentioned twenty years ago in [4]. Extensions of existing hardware platforms have been proposed, e.g., in [5] and [6]. Nevertheless, chip manufacturers have not yet been convinced of the usefulness of adapting chips specifically to implement interval analysis. This is why interval analysis is mainly performed by software implementations on general-purpose processors. Interval computations are, however, performed quite inefficiently on such processors, since the repeated rounding-mode switches they require result in repeated flushes of the processor pipeline [7]. This specific problem has led to the study and design of dedicated floating-point units (FPUs) well suited to the two rounding modes (towards −∞ and towards +∞) [6]. Moreover, in many applications, 32-bit FPUs are oversized.
Measurements corrupted by errors do not need to be processed with such accuracy, and in many cases smaller FPUs with reduced precision may fit the application constraints and provide satisfactory accuracy. For example, 16-bit floating-point computation is an efficient way to tackle both the accuracy and dynamic-range problems encountered in signal and image processing [8], for filtering and convolution-based algorithms. This paper introduces 16-bit floating-point arithmetic adapted to interval computations. The main idea is inspired by [6], which proposed implementing two 32-bit FPUs on the 64-bit FPU of a general-purpose processor. Here, similarly, noticing that a 16-bit FPU is smaller than a 32-bit FPU, two 16-bit FPUs (managing the two rounding modes required for interval computations) are shown to be not much bigger than a single 32-bit FPU. The main advantage is that no rounding-mode switching is required, which avoids flushing the processor pipeline. The 16-bit FPU is implemented on the FPGA-based NIOS-II soft processor [9, 10], which allows instructions to be added to its instruction set. Customizable processors represent an opportunity to build efficient and low-cost on-chip interval applications for embedded use. To compare the performance of 16-bit and 32-bit FPUs, an example of source localization using a network of acoustic or electromagnetic sensors is considered. In such a network of sensors, power consumption and computational complexity are strong constraints when one is concerned with increasing operability and autonomy [11]. Distributed interval constraint propagation [12] has been proposed as an efficient and low-complexity solution for source localization using a network of wireless sensors. This talk first presents centralized and distributed source localization problems and describes solutions based on interval analysis. The architecture of the 16-bit FPU is then presented.
Attention is paid to accuracy and dynamic range. Results provided by a 32-bit FPU are compared to those obtained with two 16-bit FPUs on realistic simulated data. Then, the hardware implementation on the three targeted architectures (Pentium 4, Pentium 4-M, and NIOS-II) is presented, and benchmarks for execution time and energy consumption are provided.
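
The role of the two rounding modes can be illustrated with a small software model. The sketch below is not the hardware design described in the abstract: it assumes the 16-bit format is IEEE 754 binary16 (the abstract does not specify the format), it emulates directed rounding in software using Python's `struct` module, and the names `round_down`, `round_up`, `enclose`, and `iadd` are hypothetical. Each interval bound is rounded in its own direction (lower bound towards −∞, upper bound towards +∞), which is what two FPUs with fixed rounding modes would do in parallel without any mode switch.

```python
import math
import struct

HALF_MAX = 65504.0  # largest finite binary16 value (assumed format)


def _bits(x: float) -> int:
    """Round x to the nearest binary16 value and return its 16-bit pattern."""
    return struct.unpack("<H", struct.pack("<e", x))[0]


def _value(bits: int) -> float:
    """Decode a 16-bit pattern as a binary16 value."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


def _neighbour(bits: int, up: bool) -> int:
    """Pattern of the adjacent binary16 value towards +inf (up) or -inf."""
    if bits & 0x7FFF == 0:  # +0.0 or -0.0: step to the smallest subnormal
        return 0x0001 if up else 0x8001
    positive = (bits & 0x8000) == 0
    # Magnitude grows with the bit pattern, so the direction flips for negatives.
    return bits + 1 if positive == up else bits - 1


def round_down(x: float) -> float:
    """Largest binary16 value <= x (rounding towards -inf)."""
    if math.isinf(x) or math.isnan(x):
        return x
    if x > HALF_MAX:
        return HALF_MAX
    if x < -HALF_MAX:
        return -math.inf
    bits = _bits(x)
    if _value(bits) > x:  # nearest rounding went the wrong way
        bits = _neighbour(bits, up=False)
    return _value(bits)


def round_up(x: float) -> float:
    """Smallest binary16 value >= x (rounding towards +inf)."""
    if math.isinf(x) or math.isnan(x):
        return x
    if x < -HALF_MAX:
        return -HALF_MAX
    if x > HALF_MAX:
        return math.inf
    bits = _bits(x)
    if _value(bits) < x:
        bits = _neighbour(bits, up=True)
    return _value(bits)


def enclose(x: float) -> tuple:
    """Tightest binary16 interval containing the real number x."""
    return (round_down(x), round_up(x))


def iadd(a: tuple, b: tuple) -> tuple:
    """Interval sum with outward rounding; assumes finite binary16 bounds."""
    # The double-precision sum of two binary16 values is exact, so only the
    # final conversion to 16 bits needs directed rounding.
    return (round_down(a[0] + b[0]), round_up(a[1] + b[1]))


x = enclose(0.1)
y = enclose(0.2)
s = iadd(x, y)
print(x, y, s)  # s is guaranteed to contain the exact real sum 0.3
```

Because each bound only ever uses one rounding direction, the lower and upper computations are independent, which is the property the abstract exploits by dedicating one 16-bit FPU to each mode.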


Similar resources

Embedded Fuzzy Controller for Industrial Applications

Fuzzy logic makes it feasible to build fuzzy controllers on low-cost 16-bit microcontrollers with the same performance as controllers realized with more expensive digital signal processors (DSPs). In this article, the implementation of such a fuzzy controller is proposed for a 16-bit microcontroller with a fast fuzzification-inference-defuzzification algorithm. Because the ...


Design and Implementation of Complex Floating Point Processor Using FPGA

This paper presents complete processor hardware with three arithmetic units. The first arithmetic unit can perform 32-bit integer arithmetic operations. The second unit can perform arithmetic operations such as addition, subtraction, multiplication, division, and square root on 32-bit floating point numbers. The third unit can perform arithmetic operations such as addition, subtraction, multipl...


A New Class of Floating-point Formats

Writing software for a 16-bit digital-signal processing (DSP) application is difficult. One of the main reasons for this difficulty is that the data formats available on a standard 16-bit compiler and processor do not provide adequate dynamic range or noise performance for many DSP applications at high speed. This problem has been widely recognized, and proposed solutions have been developed an...


Implementation of Deep Convolutional Neural Net on a Digital Signal Processor

In this paper I will discuss the feasibility of an implementation of an algorithm containing a Deep Convolutional Neural Network for feature extraction, and softmax regression for feature classification, for the purpose of real-time lane detection on an embedded platform containing a multi-core Digital Signal Processor (DSP). I will explore the merits of using fixed point and floating point ari...


The QC-2 parallel Queue processor architecture

Queue-based instruction set architecture processors offer an attractive option in the design of embedded systems. In our previous work, we proposed a novel queue processor architecture as a starting point for hardware/software design space exploration for embedded applications. In this paper, we present a high-performance 32-bit Synthesizable QueueCore (QC-2), an improved and optimized version o...




Publication date: 2009